AD-A068  724 


UNCLASSIFIED 


HARVARD UNIV CAMBRIDGE MA CENTER ON DECISION AND CON--ETC  F/G 12/1
MONITORING COOPERATIVE AGREEMENTS BETWEEN PRINCIPALS AND AGENTS--ETC(U)
FEB 79  R RADNER  N00014-77-C-0533




MONITORING  COOPERATIVE 
AGREEMENTS  BETWEEN 
PRINCIPALS  AND  AGENTS 


Roy  Radner 


Technical  Report  No.  3 




Prepared  under  Contract  No.  N00014-77-C-0533 
Project  No.  NR  277-240 

for  the  Office  of  Naval  Research 


This  document  has  been  approved  for  public  release 
and  sale;  its  distribution  is  unlimited. 

Reproduction  in  whole  or  part  is  permitted  for  any 
purpose  of  the  United  States  Government. 


Harvard  University 
Littauer  #308 
Cambridge,  Mass.  02138 


February,  1979 



1.  Introduction^1


Theories of agency and of the design of incentives in organizations typically
portray the members of the organization as players in a noncooperative
game. The predictive theory that naturally accompanies this point of view is
that of Nash equilibria, including Harsanyi's elaboration of that theory to
accommodate situations in which the players have incomplete information about
the parameters of the game.

On the other hand, much normative theory of organizations uses the framework
of cooperative game theory, with its array of alternative "solution"
concepts (value, core, von Neumann-Morgenstern solution, Nash bargaining
solution, etc.). Furthermore, empirical observations of organizations reveal
widespread cooperative behavior, as well as noncooperative behavior, so that
cooperative game theory may have descriptive as well as normative value.

What determines whether members of an organization cooperate or not?
Conventional wisdom suggests that cooperation is less likely, or less stable,
the more players there are, or the greater the difficulty of communication
among the players; cooperation is more likely (stable?) if there are mechanisms
whereby the players make binding commitments. Thus theories of industrial
organization typically assume that when the number of firms in an industry is
"large" the resulting equilibrium will be of the noncooperative type, whereas
when the number of firms is "small" the outcome may be cooperative (collusive).

^1 This research was supported by the Office of Naval Research, Contract
No. N00014-77-C-0533, and by the National Science Foundation (Grant SOC76-14768
to the University of California, Berkeley). A preliminary version of this
paper was presented at the CEME-NBER Conference on Decentralization, University
of California, San Diego, Feb. 23-25, 1979.

The theory of repeated games explores in a formal way another piece of
conventional wisdom, namely that when members of an organization have
long-lasting relationships they can encourage and maintain cooperative behavior
(without the device of binding commitments) by signalling intentions to
cooperate and by punishing defectors from informal agreements. Indeed, the theory
of repeated games provides conditions under which noncooperative equilibria of
the entire sequential game can produce cooperative outcomes of the component
subgames.

Unfortunately, such results seem to require an infinite number of repetitions
of the subgame; they are not valid for a finite number of repetitions,
no matter how large that finite number. However, similar results can be
obtained for approximate noncooperative equilibria in the finite-repetitions
case; such an approximate equilibrium is called an epsilon-equilibrium if each
player's sequential strategy is within epsilon (in utility) of being the best
response to the other players' strategies. Thus, one gets the result that, for
any fixed positive epsilon, if the number of repetitions is large enough then
there are noncooperative epsilon-equilibria that have cooperative outcomes in
each subgame. In a sense, in finite repetitions of a game, the best is the
enemy of the good!

In the principal-agent model, the agent observes a (random) environmental
variable and then chooses an action; this leads to an outcome that depends on
both the action and the environment. The principal observes this outcome (but
neither the agent's action nor the environment), and pays the agent according
to a previously announced reward function, which depends on the outcome only.

In equilibria of repeated games that sustain cooperative behavior, each
player is "punished" by the others for departures from the informal agreement
to cooperate.^2 However, in the principal-agent situation, the principal cannot
observe the agent's behavior directly, but only the consequences of his
behavior, and those consequences are also influenced by the environment. Therefore,
if cooperative agreements are to be sustained as equilibria of the repeated
game, the principal must have some statistical method of detecting "cheating"
by the agent rapidly enough to deter him from doing so; on the other hand, this
method should have a very low probability of triggering false alarms. The main
theorem of this paper (Sec. 5) shows that this is possible.^3

In Sections 2 and 3, I present the principal-agent model in the form of a
one-period game, and state a few of its properties. In Section 4 I review the
essential concepts in the theory of epsilon-equilibria of finitely repeated
games. Section 5 contains the main result on the existence of epsilon-equilibria
in T-period repetitions of the principal-agent game, when T is large (but
finite). The proof is constructive, and exhibits a family of epsilon-equilibrium
strategy pairs. Using this family of strategy pairs one can approach
arbitrarily close, in terms of average expected utility per period, to any
one-period cooperative arrangement that dominates a one-period Nash equilibrium.
Section 6 indicates some extensions of the theory.

^2 An early important paper on repeated games (supergames) is by Aumann
(1959). Characterizations of perfect Nash equilibria in infinite supergames
have been provided by Aumann and Shapley (unpublished) and by Rubinstein (1977).
For an analysis of altruism in the context of infinite supergames see Kurz
(1978). Examples of epsilon-equilibria of finite supergames have been studied
by Radner (1979a, 1979b).

^3 The main theorem uses, among other facts of probability theory, the law
of the iterated logarithm, and is related to sequential tests of hypotheses
that have power one (see Robbins and Siegmund, 1974, and the references given
there). Since the research for the present paper was completed, I had the
opportunity to see an unpublished paper by A. Rubinstein (1978), in which he
uses the law of the iterated logarithm to demonstrate the existence of Nash
equilibria with close to Pareto optimal average expected utility in an example
of an infinite supergame.

2.  A Model of a Sequential Principal-Agent Relationship

Consider a principal-agent relationship that lasts T periods. In period
t, the agent's action is $A_t$, a number between 0 and $M_a$ (a positive
parameter). The outcome of the agent's action is

    $C_t = y(A_t, Z_t)$,

where $Z_t$ is an exogenous random variable (the "state of nature" in period t).
We may interpret the variable $A_t$ as a measure of the agent's effort. The
principal observes the outcome of the agent's action, and pays the agent $W_t$.
The resulting one-period utility to the agent is $U(W_t, A_t)$, where the function
U is strictly concave, increasing in W, and decreasing in A. The one-period
utility to the principal is assumed to be a linear function of the outcome
and the payment to the agent, increasing in the former and decreasing in the
latter. By a suitable choice of units one can express the principal's
utility as $C_t - W_t$. The agent can observe the state of nature, $Z_t$, before
taking action, but the principal can observe only the resulting outcome, $C_t$.

Assume that the functions U and y are continuously differentiable, that
for every Z the function $y(\cdot, Z)$ is concave and increasing in its first
argument (the agent's action), and that the partial derivative of y with respect
to the agent's action is bounded away from 0, uniformly in Z, say $\geq M' > 0$.

Notice  that  I have  assumed  that  the  agent  is  risk-averse,  whereas  the 
principal  is  risk-neutral.  The  main  theorem  (Section  5)  can  easily  be  extended 
to  the  case  in  which  the  principal  is  risk-averse;  see  Section  6. 


3.  The One-Period Game

In this section I review the usual formulation of the principal-agent
relationship as a one-period noncooperative game.^4 I therefore omit the
subscript t on all the variables. The principal's (pure) strategy is a reward
function $\omega$ that determines the payment to the agent as a function of the
outcome of the agent's action:

    $W = \omega(C)$.

Given the reward function $\omega$, the agent chooses a decision function $\alpha$
that determines his action as a function of the state of nature:

    $A = \alpha(Z)$.

The expected utility to the agent is

    $E\,U\{\omega(y[\alpha(Z), Z]),\ \alpha(Z)\}$,

and the expected utility to the principal is

    $E\,y[\alpha(Z), Z] - E\,\omega(y[\alpha(Z), Z])$.

This is in fact a two-move game with perfect information, in which the
principal moves first, choosing the reward function, and the agent moves second,
choosing the decision function. The noncooperative solution to the game is
taken to be a Nash equilibrium.

Recall that a pair $(\omega, \alpha)$ of functions is Pareto-optimal if there is no
other pair that yields each player at least as high an expected utility, and
yields at least one of the players strictly more.

Note that the decision function $\alpha$ is a move, not a strategy.
The agent's strategy is a mapping from reward functions $\omega$ to decision
functions $\alpha$, since the agent learns the reward function before choosing the
decision function.

^4 For material on the principal-agent problem, see Shavell (1978) and the
references cited there. For a more general organizational setting of the
problem, see Groves (1973).


For the purposes of this paper, the important characteristics of Nash
equilibria and Pareto-optima of the one-period game are summarized as follows.

Proposition

(1)  In a Nash equilibrium, the reward function must be strictly increasing
on the set of realizable outcomes, i.e., on the range of $y[\alpha(\cdot), \cdot]$.

(2)  In a Pareto optimum, the reward function must be constant on the
set of realizable outcomes; hence

(3)  A Nash equilibrium cannot be Pareto-optimal.
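
The contrast between properties (1) and (2) can be made concrete with a small
numerical sketch. Everything specific in it is an assumption chosen for
illustration and is not taken from the paper: the outcome function
y(A, Z) = A + Z, the utility U(W, A) = sqrt(W) - A^2/2, the three states of
nature, the action grid on [0, 2], and the two reward functions `flat_reward`
and `sharing_reward`. Under these assumptions the agent's best action against
a constant reward is zero in every state, whereas an increasing reward elicits
positive effort.

```python
import math

STATES = [1.0, 2.0, 3.0]                  # hypothetical states of nature Z
ACTIONS = [i / 100 for i in range(201)]   # grid on [0, M_a], with M_a = 2


def outcome(action, state):
    """Hypothetical y(A, Z): increasing and (weakly) concave in A, slope 1."""
    return action + state


def agent_utility(payment, action):
    """Hypothetical U(W, A): concave, increasing in W, decreasing in A >= 0."""
    return math.sqrt(payment) - 0.5 * action ** 2


def flat_reward(c):
    """A constant reward function, as in property (2)."""
    return 1.0


def sharing_reward(c):
    """A strictly increasing reward function, as in property (1)."""
    return 0.5 * c


def best_action(reward, state):
    """Agent's best response on the grid, chosen after observing the state."""
    return max(ACTIONS,
               key=lambda a: agent_utility(reward(outcome(a, state)), a))


if __name__ == "__main__":
    for z in STATES:
        print(z, best_action(flat_reward, z), best_action(sharing_reward, z))
```

The grid search stands in for the agent's optimization; with a constant
payment the utility in this sketch is strictly decreasing in the action, so
the search always returns zero.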

Note that a consequence of property (2) is that, if $(\omega, \alpha)$ is Pareto-optimal,
then the agent's best response to the (constant) reward function $\omega$ is to
always set his action equal to 0. In other words, with a reward that is
independent of the outcome of the agent's action, he has an incentive to reduce
his effort below the level called for by the decision function $\alpha$.

4.  Epsilon-Equilibria of Repeated Games


Suppose now that the one-period game is repeated T times (T finite); the
resulting sequential game will be called the T-period game. Assume that the
utility to a player is the average of the T one-period expected utilities. A
pure sequential strategy for a player is a sequence of functions, one for
each period; the function for period t determines the player's one-period
strategy in period t as a function of all of the information available to the
player up to that period. A Nash equilibrium of the sequential (T-period)
game is a pair of sequential strategies such that each player's sequential
strategy is a best response to the other player's sequential strategy.
Equilibrium pairs of strategies will typically involve threats of "punishment" by
one player if the other player departs from some prescribed sequential strategy.

The concept of perfect equilibrium of the T-period game has been
introduced by Selten (1975) to rule out equilibria in which the players use threats
that are not "credible." For any date and any history of observations up to
that date, a player's sequential strategy determines a sequential strategy
for the remaining T-t+1 periods of play, which we may call the continuation
of the original sequential strategy, given the period and the history of
observations prior to that period. A pair of sequential strategies is a
perfect Nash equilibrium of the T-period game if, for every period t and every
history of prior observations, the respective continuations form a Nash
equilibrium of the remaining (T-t+1)-period game. Note that in the definition
of a perfect Nash equilibrium one must test, for each period t, whether the
pair of continuations is a Nash equilibrium for all possible pairs of prior
histories of observations, not just those that would be produced by the original
strategy pair.

In games in which each player can observe the other player's one-period
strategies (and not just the consequences of action), one can show that in
every perfect Nash equilibrium of the T-period game, the strategy pair used
in every period is a Nash equilibrium of the one-period game. On the other
hand, one can show that, if T is infinite, there are perfect equilibria of
the sequential game that result in the use of "cooperative" pairs of strategies
in each one-period game, and in particular in the use of Pareto-optimal pairs
of strategies.^5 This discontinuity at infinity motivates the definition of
epsilon-equilibria in the T-period game (T finite). (See Radner, 1979a and
1979b.) For any positive number epsilon, an epsilon-equilibrium is a pair of
strategies such that each player's strategy is within epsilon in average
expected utility of being a best response to the other player's strategy.

The concept of perfect Nash equilibrium can be extended to epsilon-equilibria
as follows. A sequential strategy pair is a perfect epsilon-equilibrium if,
for every period t and every history of prior observations, the continuation
of each player's strategy is within epsilon of being the best response to
the corresponding continuation of the other player's strategy. In this
definition, the utility of a continuation of a strategy is the average of the
player's expected utilities in all T periods. (For an alternative definition,
see Section 6.)

For games in which each player can observe the other player's one-period
strategies, one can show that, for any positive epsilon, if T is sufficiently
large then there are perfect epsilon-equilibria of the T-period game that
result in Pareto-optimal strategy pairs in each one-period game. In other
words, for perfect epsilon-equilibria, infinite-horizon games are approximated
well by long finite-horizon games.^6

^5 R. Aumann and L. Shapley, unpublished.

Cooperative one-period strategies can be sustained in perfect
epsilon-equilibria of the T-period game by "trigger strategies." Let
$(s_1^*, s_2^*)$ be a Nash equilibrium of the one-period game, and let
$(\hat{s}_1, \hat{s}_2)$ be a Pareto-superior pair of one-period strategies.
A trigger strategy for player 1 is defined as follows: player 1 plays strategy
$\hat{s}_1$ as long as player 2 plays strategy $\hat{s}_2$; thereafter player 1
plays $s_1^*$. The best response by player 2 to this trigger strategy is to
play $\hat{s}_2$ until the last period, and then play a best response
to $\hat{s}_1$. However, the gain in average per-period utility of doing this, over
using the corresponding trigger strategy, will be small if T is large.
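
The size of the gain from the last-period deviation is easy to work out in a
concrete case. The sketch below uses a prisoner's-dilemma stage game whose
payoffs (3 for mutual cooperation, 1 in the one-period Nash equilibrium, 5 for
a best response to a cooperating opponent) are hypothetical numbers, not taken
from the paper. It compares player 2's average payoff from every plan of the
form "first deviate in period k" against player 1's trigger strategy, assuming
that after a deviation both players obtain the Nash payoff.

```python
COOPERATE = 3.0    # hypothetical payoff to player 2 when both cooperate
TEMPTATION = 5.0   # hypothetical payoff from a best response to cooperation
NASH = 1.0         # hypothetical payoff in the one-period Nash equilibrium


def average_payoff(first_deviation, horizon):
    """Player 2's average payoff against player 1's trigger strategy.

    first_deviation = k means: cooperate in periods 1..k-1, deviate in
    period k, then receive the Nash payoff in the remaining periods.
    first_deviation = None means: never deviate.
    """
    if first_deviation is None:
        return COOPERATE
    k = first_deviation
    total = COOPERATE * (k - 1) + TEMPTATION + NASH * (horizon - k)
    return total / horizon


def best_deviation(horizon):
    """Period of the most profitable first deviation, and its gain."""
    k = max(range(1, horizon + 1), key=lambda j: average_payoff(j, horizon))
    return k, average_payoff(k, horizon) - average_payoff(None, horizon)


def smallest_horizon(epsilon):
    """Smallest T for which the best deviation gains at most epsilon."""
    t = 1
    while best_deviation(t)[1] > epsilon:
        t += 1
    return t
```

With these numbers the most profitable deviation is in the last period and
its gain in average utility is (5 - 3)/T = 2/T, so for any fixed epsilon the
pair of trigger strategies is an epsilon-equilibrium once T is at least
2/epsilon.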

The efficacy of such simple trigger strategies in sustaining perfect
epsilon-equilibria of the T-period game depends on each player being able to
rapidly detect departures from the cooperative strategies. In the
principal-agent situation considered in this paper, the principal cannot observe the
agent's actions directly, but only the consequences of his actions, and these
consequences also depend on a random state of nature. Therefore, if
cooperative arrangements are to be sustained as equilibria of the T-period game, the
principal must have available some more powerful method of detecting "cheating"
by the agent rapidly enough to reduce the agent's incentive to cheat to
negligible levels. That such a method exists is shown in the next section.

^6 These results are illustrated in Radner (1979a, 1979b). A more
general treatment of epsilon-equilibria will be presented in a forthcoming
paper.

5.  Epsilon-Equilibria of the T-Period Principal-Agent Game

Let $(\omega^*, \alpha^*)$ be a Nash equilibrium of the one-period principal-agent
game, and let $(\hat{w}, \hat{\alpha})$ be a Pareto-superior pair, where $\hat{w}$ is constant.
In this section I shall exhibit a class of perfect epsilon-equilibria of the
T-period game, using trigger-type strategies.

Defining a trigger strategy for the agent presents no problem; the agent
uses the decision function $\hat{\alpha}$ until the first time the principal does not use
the constant reward $\hat{w}$, and then optimizes against the announced reward
functions from that period on. I shall denote this strategy by $\sigma_A$.

It is important to emphasize at this point that in each one-period game
the principal's action is an announcement of a reward function, and he is
required to use that reward function for that period. The agent then observes
the current $Z_t$, and takes an action, $A_t$.

Defining a suitable trigger strategy for the principal is more difficult.
In each period t, based on the history of outcomes $C_1, \ldots, C_{t-1}$, the principal
must decide whether to make the payment $\hat{w}$ or to switch to the Nash equilibrium
reward function $\omega^*$. If his switching rule is too lax, then the agent may be
able to accumulate a large enough extra expected utility by cheating before
getting caught so as to make cheating attractive. On the other hand, if the
switching rule is too strict (too "trigger happy"!), then there will be a
substantial probability that the principal will switch to the Nash equilibrium
reward function before the agent ever starts cheating.

For the remainder of this paper, assume that the states of nature, $Z_t$,
are independently and identically distributed, and bounded. Define

    $\hat{C}_t = y[\hat{\alpha}(Z_t), Z_t]$;

thus $\hat{C}_t$ is the realized consequence in period t if the agent uses the
decision rule $\hat{\alpha}$. The $\hat{C}_t$ are independently and identically distributed, and
bounded; let c denote the expected value of $\hat{C}_t$.

Whatever the sequential strategies actually used by the players, let $C_t$
denote the realized outcome in period t, and let $S_n = C_1 + \cdots + C_n$. The $C_t$
are bounded, say by M. Let $(b_n)$ be a strictly increasing sequence of
positive numbers ($n \geq 1$), and define the random variables $\tilde{N}$ and N by:

(5.1)    $\tilde{N} = \min\{n \geq 1 : S_n - nc \leq -b_n\}$,
         $N = \min(\tilde{N}, T)$.

Consider the following trigger strategy for the principal: pay the agent $\hat{w}$
in each period 1 through N, and thereafter use the reward function $\omega^*$. I
shall denote this strategy by $\sigma_P((b_n))$.
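
To see how a switching rule of the form (5.1) behaves, the following sketch
simulates the stopping time N for an agent who follows the cooperative
decision rule and for one who shirks. The distribution of outcomes (uniform
on [0, 1], so that c = 0.5), the size of the shirking (every outcome lowered
by 0.2), the horizon, and above all the boundary b_n = n^(3/4) are assumptions
made for illustration. The boundary is chosen only because it grows faster
than the square-root scale of the random fluctuations and slower than n; it
is not claimed to be the sequence used in the paper.

```python
import random


def boundary(n):
    """Hypothetical strictly increasing boundary b_n = n**0.75."""
    return n ** 0.75


def stopping_time(outcomes, c, b=boundary):
    """N = min(N~, T), where N~ is the first n with S_n - n*c <= -b_n."""
    s = 0.0
    for n, c_n in enumerate(outcomes, start=1):
        s += c_n
        if s - n * c <= -b(n):
            return n
    return len(outcomes)  # no switch before the horizon T


def simulate(shortfall, horizon, runs, seed=0):
    """Fraction of runs with N < T, and the average N.

    Honest outcomes are uniform on [0, 1] (so c = 0.5); a shirking agent
    lowers every outcome by `shortfall`. Both choices are hypothetical.
    """
    rng = random.Random(seed)
    switched, total_n = 0, 0
    for _ in range(runs):
        outcomes = [rng.random() - shortfall for _ in range(horizon)]
        n = stopping_time(outcomes, c=0.5)
        switched += n < horizon
        total_n += n
    return switched / runs, total_n / runs


if __name__ == "__main__":
    print("honest agent  :", simulate(0.0, horizon=2000, runs=200))
    print("shirking agent:", simulate(0.2, horizon=2000, runs=200))
```

The first number reported for the honest agent estimates the false-alarm
probability Prob(N < T); the average N for the shirking agent indicates how
long cheating can go on before the principal switches.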


Define $\hat{S}_n = \hat{C}_1 + \cdots + \hat{C}_n$. If the agent uses some sequential strategy
other than $\sigma_A$, then the principal's loss (if any) during periods 1 through N
is

    $L_N = \hat{S}_N - S_N$.


Lemma 1. If the principal uses the trigger strategy $\sigma_P((b_n))$, then a bound
on his expected loss^7 during periods 1 through N is given by

    $E\,L_N \leq E\,b_N + M \leq b_T + M$.

Proof. $(\hat{S}_n - nc)$ is a martingale, and N is a bounded stopping time. Hence,
by the systems theorem for martingales (see, e.g., Chung, 1974),

(5.2)    $E(\hat{S}_N - Nc) = E\,\hat{S}_N - c\,E\,N = 0$.

By the definitions of N and $L_N$,

    $\hat{S}_N - L_N - Nc \geq -b_N - M$;

taking the expected value of both sides of the above yields

(5.3)    $E(\hat{S}_N - Nc) - E\,L_N \geq -E\,b_N - M$.

Also, since the sequence $(b_n)$ is increasing, $E\,b_N \leq b_T$. Putting this last
together with (5.2) and (5.3) yields the conclusion of the lemma.

^7 Although it is convenient to interpret $L_N$ as the principal's loss, this
is not essential to the argument that follows. What is essential is that this
is the cumulated difference in outcome in the direction of the agent's gain.
This will become clear in Lemma 2.
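
The inequality behind Lemma 1 can be checked numerically, path by path. The
sketch below assumes, purely for illustration, an outcome function
y(A, Z) = A + Z with Z uniform on [0, 1], a cooperative decision rule that
always chooses A = 1, a shirking agent who always chooses some lower constant
action, and the boundary b_n = n^(3/4); none of these is specified in the
paper. With these choices outcomes lie in [0, 2], so M = 2 and c = 1.5.

```python
import random

M = 2.0       # bound on outcomes under the assumptions stated above
C_BAR = 1.5   # c = E[C-hat_t] when A-hat = 1 and Z is uniform on [0, 1]


def boundary(n):
    """Hypothetical strictly increasing boundary b_n = n**0.75."""
    return n ** 0.75


def one_path(action, horizon, rng):
    """Return (N, L_N, S-hat_N) for an agent who always plays `action`."""
    s, s_hat, n = 0.0, 0.0, 0
    for n in range(1, horizon + 1):
        z = rng.random()
        s += action + z      # realized outcome C_n = y(A_n, Z_n)
        s_hat += 1.0 + z     # outcome C-hat_n under the cooperative rule
        if s - n * C_BAR <= -boundary(n):
            break            # the principal switches after period n
    return n, s_hat - s, s_hat


def average_loss(action, horizon, runs, seed=0):
    """Monte Carlo estimate of E[L_N] for a constant shirking action."""
    rng = random.Random(seed)
    return sum(one_path(action, horizon, rng)[1] for _ in range(runs)) / runs


if __name__ == "__main__":
    T = 1000
    for a in (0.0, 0.5, 0.9):
        print(a, average_loss(a, T, runs=200), "bound:", boundary(T) + M)
```

On every path the loss satisfies L_N <= (S-hat_N - N c) + b_N + M; taking
expectations and using (5.2) removes the first term, which is the argument of
the proof.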

Lemma 1 establishes a limit to the cumulated expected loss that the
principal can suffer up through period N. The next lemma establishes a
corresponding limit on the agent's gain. Let $A_t$ be the agent's actual action
in period t, and let $\hat{A}_t$ denote what his action would be if he used the decision
function $\hat{\alpha}$, i.e., $\hat{A}_t = \hat{\alpha}(Z_t)$. The corresponding difference in the agent's
utility is

    $D_t = U(\hat{w}, A_t) - U(\hat{w}, \hat{A}_t)$,

if the agent receives the payment $\hat{w}$. The agent's total gain in utility
during periods 1 through n is

    $D_1 + \cdots + D_n$.

Recall the assumption of Section 2 that

(5.4)    $y_1(A, Z) \geq M'$,  for all A and Z,

where $y_1$ denotes the partial derivative of y with respect to its first
argument. In addition, the boundedness of Z and of the agent's action, together
with the continuous differentiability of U, imply that, for some number $M'' > 0$,

(5.5)    $U_2(\hat{w}, A) \geq -M''$,  for all A.


Lemma 2. If the principal uses the trigger strategy $\sigma_P((b_n))$, then a bound
on the agent's possible expected gain in utility up through period N is given
by

    $E(D_1 + \cdots + D_N) \leq (M''/M')(E\,b_N + M) \leq (M''/M')(b_T + M)$.

Proof. By the concavity of y in A,

    $\hat{C}_t - C_t \geq (\hat{A}_t - A_t)\, y_1(\hat{A}_t, Z_t)$,

so that

(5.6)    $\hat{A}_t - A_t \leq \frac{\hat{C}_t - C_t}{y_1(\hat{A}_t, Z_t)} \leq \frac{\hat{C}_t - C_t}{M'}$.

The last inequality follows from (5.4).

Recall that $U_2$ is negative. By the concavity of U, and by (5.5) and
(5.6),

    $D_t \leq (A_t - \hat{A}_t)\, U_2(\hat{w}, \hat{A}_t)$
        $\leq (\hat{A}_t - A_t)\, M''$
        $\leq (M''/M')(\hat{C}_t - C_t)$.

Hence

    $D_1 + \cdots + D_N \leq (M''/M')\, L_N$,

and the conclusion of the lemma follows from Lemma 1.
Define the sequence $(b_n^\circ)$ by

         $b_1^\circ = -\operatorname{ess\,inf}\,(\hat{C}_1 - c)$,
(5.8)    $b_2^\circ = -\operatorname{ess\,inf}\,(\hat{C}_1 + \hat{C}_2 - 2c)$,

Note that $(b_n^\circ / n)$ approaches zero as n increases without limit.

Define B to be the class of positive sequences $(b_n)$ that satisfy:

(5.9)    $b_n$ are strictly increasing, and


In particular, B contains all the sequences $(\lambda b_n^\circ)$ with $\lambda > 1$.

Let $\hat{v}$ and $\hat{u}$ denote the expected one-period utilities of the principal and
agent, respectively, under the pair $(\hat{w}, \hat{\alpha})$.

Theorem. For any $\epsilon > 0$ there exists a sequence $(b_n)$ in B and a $T_\epsilon$ such that
the pair $[\sigma_P((b_n)), \sigma_A]$ is an $\epsilon$-equilibrium for all $T \geq T_\epsilon$, and yields the
principal and agent average expected utilities at least $(\hat{v} - \epsilon)$ and $(\hat{u} - \epsilon)$,
respectively, for all T.

Let $v^*$ and $u^*$ denote the expected one-period utilities of the principal
and agent, respectively, under the (Nash equilibrium) pair $(\omega^*, \alpha^*)$. Consider
a pair $[\sigma_P((b_n)), \sigma_A]$ of sequential trigger strategies, with $(b_n)$ in B. The
corresponding average expected utility to the principal is

(5.11)   $\frac{1}{T}\,[(E\,N)\,\hat{v} + (T - E\,N)\,v^*]$.

(Use the martingale systems theorem again.)

Recall that $\hat{v} \geq v^*$. Define

    $\delta = \mathrm{Prob}(N < T)$;

then (5.11) is at least as large as

(5.12)   $(1 - \delta)\,\hat{v} + \delta\,v^*$.

This is as large as $\hat{v} - \epsilon$ if

(5.13)   $\delta \leq \frac{\epsilon}{\hat{v} - v^*}$.


If the principal were to switch in any period n to a reward function other
than the constant $\hat{w}$, then in that period and thereafter the agent would
optimize against the announced reward functions; hence in periods n through T it
would be optimal for the principal to use the reward function $\omega^*$. Hence the
principal's optimal response to the agent's strategy is to use the constant
reward $\hat{w}$ in all periods. The resulting average expected utility to the
principal is $\hat{v}$. Therefore, if (5.13) is satisfied, the strategy $\sigma_P((b_n))$ is
within $\epsilon$ of being optimal against $\sigma_A$.

If the agent follows strategy $\sigma_A$ against $\sigma_P((b_n))$, then his average
expected utility is

(5.14)   $\frac{1}{T}\,[(E\,N)\,\hat{u} + (T - E\,N)\,u^*]$.

Since $\hat{u} \geq u^*$, it follows that (5.14) is not less than

(5.15)   $(1 - \delta)\,\hat{u} + \delta\,u^*$,

which is at least $(\hat{u} - \epsilon)$ if

(5.16)   $\delta \leq \frac{\epsilon}{\hat{u} - u^*}$.

If the agent uses some sequential strategy $\sigma$ instead of $\sigma_A$ against the
principal's strategy $\sigma_P((b_n))$, then his average expected utility is

(5.17)

where N is the stopping time under the agent's strategy $\sigma$. If the agent uses
$\sigma_A$, his average expected utility is, by (5.15), at least

(5.18)

The increment in average expected utility to the agent from using $\sigma$ instead of
$\sigma_A$ is therefore not more than the difference between (5.17) and (5.18), which
can be written as


Using Lemma 2, and recalling that $\hat{u} \geq u^*$


Therefore, the proof of the theorem is completed by taking $\delta$ to satisfy both
(5.13) and (5.21), and $T_\epsilon$ to satisfy (5.22); the latter is of course possible
because $(b_T/T)$ approaches zero as T increases without limit.


6.  Extensions 


In the model set out in Section 2, it was assumed that the principal's
utility is a linear function of the outcome of the agent's action and the
payment to the agent. With a small change in the hypothesis, the theorem of
Section 5 remains true if the principal's utility is a concave function of
these two variables, increasing in the first and decreasing in the second.
The required change is that the constant payment $\hat{w}$ be replaced by a possibly
nonconstant reward function $\hat{\omega}$. With this more general hypothesis about the
principal's utility function, one can no longer guarantee that in
Pareto-optimal arrangements in the one-period game the reward function is constant
(cf. the Proposition of Section 3).

In another paper (1979b) I have examined an alternative definition of
perfect epsilon-equilibrium in which the utility of the continuation of a
sequential strategy is the average utility in the remaining periods, rather
than the average of the utilities in all T periods. This change makes the
definition of perfect epsilon-equilibrium more restrictive, in the sense that,
for every positive epsilon, the set of perfect epsilon-equilibria is smaller.
In the application referred to, one could show that, with this alternative
definition, cooperative behavior would break down as the game approached the
horizon T. This conclusion, which is in accord with observation and common
sense, can probably be extended to the principal-agent model, but I have not
carried out the details.